Title: Haskell Magic: Model checking made simple, part 1
Date: 2018-10-01
Modified: 2018-10-01
Category: Blog
Slug: haskell-magic-pl-interpreter
Tags: haskell-magic, haskell
Summary: Where I over-engineer the first part of a homework assignment

As some of you know, I'm a PhD candidate, which involves doing teaching
assistant work. One of the classes I'm working with involves a bunch of model
checking, based on [this text here][1]. As this (re-)uses a whole lot of
material from logic, automata and algorithm design and analysis, this is
something I found fairly interesting, despite not really knowing much about it. 

Just today, I got asked about one of the assignments, which requires
implementing a way of checking invariant properties of a state graph
(definitions coming, I promise). This led to me demonstrating how you can do
(some of) this in Haskell, and how concise, clear and simple it is. Given that
the majority of my students haven't even _heard of_, much less seen, any Haskell
(our university is a Microsoft and Java shop, in all the worst ways), this
elicited oohs and ahs of amazement. Based on this, I decided I'd try and do
(something resembling) what they are being asked to do, in a series of blog
posts to fulfil my birthday condition. If anyone from my class is reading - I
hope this helps you learn Haskell, but don't expect to copy my code and get away
with it.

Without further ado, let's begin.

## Some basic ideas ##

[_Model checking_][2] is a broad field involving formal verification of the
properties of certain systems. A lot of the time, these systems involve hardware
or software, but they don't have to. The process of model checking usually goes
something like this:

  1. Construct a _model_ of the system you want to check.
  2. Write a _property_ describing the thing you want to reason about regarding
     this system.
  3. _Check_ it to see if the model entails this property.

Each of these steps can be approached in a wide variety of ways, many of which
are borrowed from other fields. In particular, automata and logic come up
frequently, and given that we want this checking to be doable by machine,
algorithms and data structures are also quite important. 

In this particular case, we are dealing with a _state graph_. This is a general 
representation of a system which can change states - that is, something
where, over time, given certain actions, different effects can occur. More
formally, a state graph is a 5-tuple _(S, I, E, P, L)_ where
 
  * _S_ is a finite set of _states_
  * _I_ is a subset of _initial states_ of _S_
  * _E_ is a set of _edges_ from states to other states
  * _P_ is a finite set of _propositions_
  * _L_ is a _labelling_ of each state in _S_ with some (possibly empty) set of
    propositions in _P_

For a (visual) example, consider the diagram below:

<img alt="A diamond-shaped graph of circular nodes. The topmost node, labelled
'{}' has arrows going from it to a node on the left, labelled '{p}', and a node
on the right, labelled '{q}', creating a fork. Both the left node and the right
node have arrows going from them to a bottom node, labelled '{p,q}'. The bottom
node also has an arrow pointing back to the left node, labelled '{p}'. Finally,
an arrow above a diamond points from empty space above the topmost node to the
topmost node." src="{static}/images/state-graph.svg"/>

This state graph has four states, each labelled with its set of propositions.
There's one initial state (the one with the arrow coming into it 'from
nowhere'), which happens to be labelled with the empty set. Arrows from states
to states indicate edges, which have a direction indicated by the arrowhead.
This state graph represents a process which may be in one of four different
'states', with different effects in each one, represented by the labelling.
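To make this concrete, here's one way the example graph could be sketched in
Haskell. Everything here - the field names, and using ``Int`` for states and
``String`` for propositions - is my own choice for illustration; we'll settle
on a real representation in a later part.

    :::haskell
    -- A sketch of a state graph as a 5-tuple-style record.
    data StateGraph = StateGraph
      { states    :: [Int]           -- S
      , initial   :: [Int]           -- I
      , edges     :: [(Int, Int)]    -- E
      , props     :: [String]        -- P
      , labelling :: Int -> [String] -- L
      }

    -- The diamond example above, numbering the states
    -- 0 (top), 1 (left), 2 (right) and 3 (bottom):
    example :: StateGraph
    example = StateGraph
      { states    = [0, 1, 2, 3]
      , initial   = [0]
      , edges     = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 1)]
      , props     = ["p", "q"]
      , labelling = \s -> case s of
          0 -> []
          1 -> ["p"]
          2 -> ["q"]
          _ -> ["p", "q"]
      }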

Properties of state graphs are stated in terms of the elements of _P_, and can
be either true or false. For the state graph above, 'at some point, _p_ will be
true' is a property (which happens to be true); 'at any point, _q_ will be true
in the next state' is also a property (which happens to be false). Given enough
propositions, and a complex enough state graph, we can describe some extremely
sophisticated properties. Additionally, if we can state these properties in a
certain way, depending on the property, we can mechanically check if a given
state graph possesses that property or not. This is essential, as in any real
cases, state graphs can be incredibly large (millions of states isn't unusual),
and the resulting properties can be extremely complex. 

In the context of this assignment, the properties of interest are _invariants_,
which are properties of a 'global' nature for the entire state graph. An example
of an invariant for the state graph above would be '_p_ is true in every state'
(which is, of course, false); another would be 'in any state, if _p_ is true,
_q_ is false' (also false, as it happens). Invariants are the simplest
properties, both in terms of checking _and_ description: for the former, it's
enough to check that the property holds in each state individually (using some
kind of graph traversal); for the latter, we can use propositional logic as our
description language (which also makes checking easy). The assignment
specifically asks for the ability to:

  1. Take a description of a state graph and an invariant property as input.
  2. Construct both.
  3. Check if the given invariant property holds.

For this part of the series, I will describe propositional logic, write a simple
interpreter for it, and then _massively over-engineer_ it. Because without that
last step, this would be a highly dull blog post. We'll continue the
over-engineering in the second part, where we veer dangerously into compiler
design territory; then in the third and fourth parts, we'll design the rest.

## Propositional logic for the masses ##

First off, a disclaimer: the following description is not, in any way, shape or
form, meant to be a definitive one. Much more thorough treatments of this topic
[can be found elsewhere][3], and even [Wikipedia][4] is OK for detail purposes.
I'm only going to give enough information to explain what I'm about to do and
why we're writing the Haskell we're writing. If you happen to be a logic expert,
this will probably be laughably simplistic for you (and you could likely teach
me this course) - consider yourselves warned.

Propositional logic is basically the simplest kind of logic we can have that's
still interesting. The 'objects' that we work with in propositional logic are
_propositions_ - statements which can either be true or false. 'It is sunny
outside' is an example of a proposition; another one is 'Yesterday I ate a
hamburger for breakfast'. To keep the notation simple and precise, we usually
refer to propositions using symbols _p~1~, p~2~_ and so on. At any given time,
we're only going to need a finite number of these, which we'll collect together
into a big set _P_, which we can think of as the 'universe' of things we
currently care about.

Now, if we could only deal with propositions alone, we wouldn't have a terribly 
interesting system. Thus, we also have _operators_, which can take propositions
and construct other propositions. We can talk about the _arity_ of an operator,
which refers to how many propositions it 'consumes'; in our case, we have only
_unary_ ('consumes' one) and _binary_ ('consumes' two) operators. In particular,
we have the unary operator ∼ (pronounced 'not'), and the following binary
operators:
  
  * ∧ ('and')
  * ∨ ('or')
  * → ('if')
  * ↔ ('iff', short for 'if and only if')

We write 'not' prefix (thus, before whatever it's operating on), and all the
binary operators infix. To avoid ambiguity, we'll also use parentheses for 
grouping, just like with algebra. Our operators can nest inside each other
as much as we want - _p~1~_ ∧ _(p~2~_ ∨ _p~3~)_ is fine.

Obviously though, we can't just put together propositions and operators any old
how - we have to have some rules. These are called the _syntax_ of propositional
logic; anything which conforms to them is called a _well-formed formula_ (or
_wff_, pronounced 'woof'). Typically, if we're talking about arbitrary wffs,
we'll call them Φ and Ψ, possibly with subscripts. To spell out our syntax
precisely, we have the following rules:

  1. Any proposition _p_ on its own is a wff.
  2. If Φ is a wff, then (∼Φ) is a wff.
  3. If Φ and Ψ are both wffs, then (Φ ∧ Ψ), (Φ ∨ Ψ), (Φ → Ψ) and (Φ ↔ Ψ) are 
     all wffs.
  4. Nothing else is a wff.

We'll drop parentheses for 'obvious' cases in writing, but you can assume that
they're all there. Based on the rules above, we can represent a wff as a tree -
see the example below for the tree corresponding to _p~1~_ ∧ _(p~2~_ ∨ _p~3~)_.

<img alt="A binary tree, whose root is labelled '∧', with two children. Its left 
child is another binary tree, whose root is labelled '∨'. Its right child is a 
leaf labelled 'p1'. The left child of the left subtree is a leaf labelled 'p2', 
and the right child of that same subtree is a leaf labelled 'p3'." 
src="{static}/images/parse-tree.svg"/>

We call such a tree a _syntax tree_.

So far, everything we've talked about has been to do with _form_ - we haven't
given any meaning to anything yet. In the context of a _logic_, meaning is to do
with truth - and in propositional logic, truth is binary. We define two symbols,
representing truth and falsehood respectively; in our case, they'll be ``True``
and ``False``. _Every_ wff is one or the other - to determine which is the job
of our _semantics_, which is what gives meaning to our logic. Before we can talk
about the meaning of operators, let's first think about propositions. In order
for us to have meaning in our propositions, we have to specify what the truth
value of each proposition in _P_ is. We call this an _interpretation_, and we
can think of it as a function that takes propositions and emits ``True`` or
``False``.

Armed with some interpretation _I_, we can now define the _truth value_ of any
wff.  

  1. If a wff is a single proposition _p_, its truth value is _I(p)_.
  2. If a wff has the form (∼Φ), its truth value is ``True`` if the truth value
     of Φ is ``False``; otherwise, its truth value is ``False``.
  3. If a wff has the form (Φ ∧ Ψ), its truth value is ``False`` unless the
     truth value of _both_ Φ and Ψ is ``True``, in which case it is ``True``.
  4. If a wff has the form (Φ ∨ Ψ), its truth value is ``True`` unless the truth
     value of _both_ Φ and Ψ is ``False``, in which case it is ``False``.
  5. If a wff has the form (Φ → Ψ), its truth value is ``True`` unless the truth
     value of Φ is ``True`` and Ψ is ``False``; in that case, it's ``False``.
  6. If a wff has the form  (Φ ↔ Ψ), its truth value is ``True`` if the truth
     values of Φ and Ψ match; otherwise, it is ``False``.

Looking at this from the point of view of syntax trees, you can think of this
process as a 'collapsing' of the tree bottom-up using _I_ to process the leaves.
We can then repeatedly apply the rules given here until we reach the root, which
will give us the truth value of the whole formula.
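As a hand-worked example of this collapsing, take _p~1~_ ∧ _(p~2~_ ∨ _p~3~)_
under an interpretation making _p~1~_ and _p~3~_ ``True`` and _p~2~_ ``False``
(the values are mine, picked just for illustration):

    :::haskell
    -- The leaves, as given by the interpretation:
    p1, p2, p3 :: Bool
    p1 = True
    p2 = False
    p3 = True

    -- Rule 4 collapses (p2 ∨ p3) to True; rule 3 then collapses
    -- the root, since both children are now True:
    result :: Bool
    result = p1 && (p2 || p3)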

How does this relate to invariant checking? Given that invariants are checked on
a state-by-state basis, and that each state is labelled with a set of
propositions, we can see each state as defining its own _induced interpretation_
in which any proposition it's labelled with is ``True``, and everything else is
``False``. Logicians call this the 'closed-world assumption' - essentially, only
the things we _say_ are true are actually true. Thus, we can 'spell out' our
invariant as a propositional wff, then just wander over the entire state graph
state-by-state and check that every induced interpretation assigns that wff the
truth value ``True``. Easy!
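In fact, the whole checking loop can be sketched in a couple of lines, even
before we've written the interpreter. Here I represent each state only by its
label set, and abstract over however we end up evaluating a wff in a state
(``checkInvariant`` and ``holdsIn`` are placeholder names of my own):

    :::haskell
    -- An invariant holds on a state graph exactly when it holds in
    -- every state. 'holdsIn' stands in for whatever wff evaluation
    -- we end up writing; a state is represented by its label set.
    checkInvariant :: ([prop] -> Bool) -> [[prop]] -> Bool
    checkInvariant holdsIn = all holdsIn

    -- 'p is true in every state', on the diagram above, gives
    -- False, since the initial state has no labels:
    -- checkInvariant ("p" `elem`) [[], ["p"], ["q"], ["p", "q"]]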

## Propositional logic interpreter - the simple version ##

Let's start off with a basic version, following the definitions pretty much
word-for-word.

    :::haskell
    data Wff a
      = Prop a
      | Not (Wff a)
      | And (Wff a) (Wff a)
      | Or (Wff a) (Wff a)
      | If (Wff a) (Wff a)
      | Iff (Wff a) (Wff a)
      deriving (Show)

The syntax defined above practically cries out for a Haskell-style algebraic
data type to enforce its rules. Given that we haven't specified anything about
what propositions _are_ (we've named them, but that's about it), we'll leave it
to the user to decide how to deal with them.

    :::haskell
    newtype Interpretation a = Interpretation 
      { runInterpretation :: a -> Bool 
      }

    truthValue :: Interpretation a -> a -> Bool
    truthValue = runInterpretation

Given that our propositions are allowed to be anything, an interpretation for us
is some way to map whatever we're using to represent propositions into ``Bool``.
This 'newtype wrapper' gives us a way of changing the implementation without
having to break anything, and also avoids exposing internals unless we want to.

Now we have all the pieces needed to write our interpreter.

    :::haskell
    interpret :: Interpretation a -> Wff a -> Bool
    
If we have a single proposition, we just ask our interpretation what to do:

    :::haskell
    interpret i (Prop x) = truthValue i x

For a 'not', we 'push' ``interpret`` to the wff being operated on, then flip the
outgoing truth value:

    :::haskell
    interpret i (Not phi) = not . interpret i $ phi

'And's and 'or's are dealt with similarly - we 'push' ``interpret`` to their
'child' wffs, then combine the results using ``(&&)`` and ``(||)`` respectively:

    :::haskell
    interpret i (And phi psi) = interpret i phi && interpret i psi
    interpret i (Or phi psi) = interpret i phi || interpret i psi

For 'if', we want to give back ``True`` except when the left 'branch' is
``True`` and the right branch is ``False``:

    :::haskell
    interpret i (If phi psi) = (not . interpret i $ phi) || interpret i psi

Lastly, for 'if and only if', we want to check if the truth values match:

    :::haskell
    interpret i (Iff phi psi) = interpret i phi == interpret i psi

... and that's it. This interpreter and the associated data structures work just
fine, and could be used as-is. The simplicity and ease of translation shown
here is why many (including myself) love Haskell.

However, these definitions are basic, and probably not as efficient as they
could be. Let's see how they can be improved.

## Interpretations shouldn't be functions ##

Our definition of an interpretation as (a wrapper around) a ``a -> Bool`` is
extremely general, and could be used to define a lot of interesting behaviour.
However, in our case, this isn't really needed, and in fact, would cause
complications, as we would have to create (potentially different) functions for
each state in our state graph when doing our checks. Because every state is
labelled with exactly those propositions that happen to be true there, we can
just as easily have our ``Interpretation a`` simply be a collection of ``a``s
which are true at the moment.

Let's start with lists, as this is easy. First, we change our ``newtype``
appropriately:

    :::haskell
    newtype Interpretation a = Interpretation
      { runInterpretation :: [a] 
      }

We also have to adjust ``truthValue`` to look inside the ``Interpretation``'s
list: if it finds the second argument there, we give back ``True``, and
``False`` otherwise. This requires us to introduce an ``Eq`` constraint as well.

    :::haskell
    truthValue :: (Eq a) => Interpretation a -> a -> Bool
    truthValue i x = x `elem` runInterpretation i

To get everything to compile again, we also need to add an ``Eq`` constraint to
``interpret``:

    :::haskell
    interpret :: (Eq a) => Interpretation a -> Wff a -> Bool
    -- rest of interpret unchanged

Now, we don't have to use functions for ``Interpretation``s anymore. While we
did add an extra constraint, limiting what we can use to represent propositions,
this is going to be much simpler to work with. This also shows why having the
``newtype`` wrapper around our interpretation implementation helps; any changes
we make to the implementation can be (mostly) invisible at the interface level.

## Speeding up interpretations ##

Lists are certainly useful, but they're _definitely_ not good for lookups.
Potentially, every time we want to call ``truthValue``, we might have to scan
the entire list before giving an answer! Additionally, if we have large wffs
being used to check large state graphs, we're going to be doing this scanning
over and over again, which is [definitely not a good idea][5]. Let's fix this
by thinking about the problem a little more and choosing a better data
structure.

If we think about how the model checking process will work in our application,
we expect the wff to be built _once_, but interpreted many times. Additionally,
we will have to do a lot more calls to ``truthValue`` than constructions of
``Interpretation``s: while the former will have to be done for every proposition
that shows up in our wff, the latter will only be done once per state. This
clearly requires a structure optimized for lookups, which ideally doesn't
constrain ``a`` too much. My suggestion would be something like ``HashSet`` from
[``unordered-containers``][6], as ``Hashable`` is an easy constraint to satisfy
for anything we could realistically want to use for propositions, and lookups
are fast.

The changes we need to make are quite minor:

    :::haskell
    import Data.Hashable

    import qualified Data.HashSet as HS

    -- Wff unchanged

    newtype Interpretation a = Interpretation
      { runInterpretation :: HS.HashSet a
      }

    truthValue :: (Eq a, Hashable a) => Interpretation a -> a -> Bool
    truthValue i x = x `HS.member` runInterpretation i

    interpret :: (Eq a, Hashable a) => Interpretation a -> Wff a -> Bool
    -- rest of interpret unchanged

Although we could have used ``elem`` (as it's provided by the wonder-weapon that
is [``Foldable``][7], which ``HashSet`` is an instance of), it's much less
efficient due to how ``HashSet`` is defined. We also need a ``Hashable``
constraint on ``truthValue`` and ``interpret``, but, as we mentioned before,
it's not a big ask for any realistic choice of representation.
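As a quick illustration of how an induced interpretation gets built with this
representation (the ``String`` propositions here are just my example values):

    :::haskell
    import qualified Data.HashSet as HS

    -- The label set of a state labelled {p, q}, as a HashSet:
    pq :: HS.HashSet String
    pq = HS.fromList ["p", "q"]

    -- HS.member "p" pq is True; HS.member "r" pq is False, in
    -- keeping with the closed-world assumption.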

## 'Compiling' wffs ##

So far, our changes have been simple, and we've kept the data type we use for
_building_ a wff and the data type we use for _interpreting_ the wff the same.
However, these two tasks are at cross purposes: the structure for building the
wff should be expressive and easy to use when defining arbitrarily complex wffs,
but the structure for interpreting should be as efficient as possible. Given
this, we'd like to separate these two concerns, and define another type which is
simpler, but can be interpreted more quickly. This is actually what compilers do
- they take something that's 'easy' for humans (to write and understand) and
convert it into something that's 'easy' for machines (to execute).

At the same time, when we do anything like this, we have to make sure that we
don't accidentally change the semantics - you can always get the wrong answer
infinitely fast, after all. To ensure that this doesn't happen, we're going to
use the concept of _logical equivalence_. Essentially, if we have two wffs Φ 
and Ψ, we say that they are _equivalent_ (which is written Φ ≡ Ψ) if they have
the same truth value under _any_ interpretation. This property means that we can
always replace Φ with Ψ (and vice-versa) without changing the semantics, no
matter the interpretation. There are many equivalences in propositional logic
(exercises on which being a major reason why many people end up hating
first-year logic classes, but I digress); we will use a few of them.

One way we can radically simplify things is to eliminate all our operators and
replace them with a _single_ operator from which all the others can be rebuilt.
Thanks to [Emil Post][8], we actually have _two_ options: either ↑ (called the
'Sheffer stroke' or ``NAND``) or ↓ (called the 'Quine dagger' or ``NOR``).
Although it really doesn't matter which one we go with, let's try the Sheffer
stroke. Instead of rewriting the equivalences that make it possible, [I'll defer
to Wikipedia][9] and let you look them up yourself.

Welcome back! Let's now implement a new data type to represent
Sheffer-stroke-only wffs, as well as a function to interpret them.

    :::haskell
    data CWff a -- for 'compiled wff'
      = CProp a
      | Stroke (CWff a) (CWff a)
      deriving (Show)

    interpret' :: (Eq a, Hashable a) => Interpretation a -> CWff a -> Bool
    interpret' i (CProp x) = truthValue i x
    interpret' i (Stroke phi psi) = not (interpret' i phi && interpret' i psi)

Wow, _much_ simpler. However, one important step remains - we currently have no
way of converting a ``Wff`` into a ``CWff``. Something like this, perhaps:

    :::haskell
    compile :: Wff a -> CWff a
    compile (Prop p) = CProp p
    compile (Not phi) = Stroke (compile phi) (compile phi)
    compile (And phi psi) =
      Stroke
        (Stroke (compile phi) (compile psi))
        (Stroke (compile phi) (compile psi))
    compile (Or phi psi) =
      Stroke
        (Stroke (compile phi) (compile phi))
        (Stroke (compile psi) (compile psi))
    compile (If phi psi) = Stroke (compile phi) (Stroke (compile psi) (compile psi))
    compile (Iff phi psi) =
      Stroke
        (Stroke (compile phi) (compile psi))
        (Stroke
          (Stroke (compile phi) (compile phi))
          (Stroke (compile psi) (compile psi)))

Eeek. While this certainly works, even in Haskell it's not a pretty sight, and
writing this is thoroughly unpleasant. Despite the added simplicity in the data
type, the compilation can _massively_ inflate the size of the resulting
structure (just look at all those nested ``Stroke``s if you have any doubts).
While this is certainly easier to interpret in terms of the number of rules
required, it's not looking like it'll be any faster.
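Before moving on, it's worth convincing ourselves that these encodings really
are equivalences. Here's a standalone sanity check on plain ``Bool``s (separate
from our interpreter; ``nand`` plays the role of ↑, and the function names are
my own), exhaustively comparing each pattern used in ``compile`` against the
operator it replaces:

    :::haskell
    -- The Sheffer stroke itself, on Bools:
    nand :: Bool -> Bool -> Bool
    nand x y = not (x && y)

    -- Each clause of 'compile', as a Bool-level pattern:
    notS :: Bool -> Bool
    notS x = x `nand` x

    andS, orS, ifS, iffS :: Bool -> Bool -> Bool
    andS x y = (x `nand` y) `nand` (x `nand` y)
    orS  x y = (x `nand` x) `nand` (y `nand` y)
    ifS  x y = x `nand` (y `nand` y)
    iffS x y = (x `nand` y) `nand` ((x `nand` x) `nand` (y `nand` y))

    -- True exactly when every encoding agrees with its operator
    -- on all possible inputs:
    allAgree :: Bool
    allAgree = and
      [ and [ notS x == not x          | x <- bs ]
      , and [ andS x y == (x && y)     | x <- bs, y <- bs ]
      , and [ orS  x y == (x || y)     | x <- bs, y <- bs ]
      , and [ ifS  x y == (not x || y) | x <- bs, y <- bs ]
      , and [ iffS x y == (x == y)     | x <- bs, y <- bs ]
      ]
      where bs = [False, True]

Since ``Bool`` only has two inhabitants, checking every combination like this
is cheap, and ``allAgree`` comes out ``True``.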

## Where to from here? ##

Our ridiculous over-engineering needs to continue in order to make all this
worthwhile. Join us next time, when we get close to writing a compiler in
the pursuit of Ever Greater Speed. Meanwhile, feel free, as some of my
students did, to appreciate how straightforward Haskell makes writing such
things. Think hard and have fun!

[1]: https://mitpress.mit.edu/books/principles-model-checking
[2]: https://en.wikipedia.org/wiki/Model_checking
[3]: https://www.springer.com/gp/book/9781447141280
[4]: https://en.wikipedia.org/wiki/Propositional_calculus
[5]: https://www.imn.htwk-leipzig.de/~waldmann/etc/untutorial/list-or-not-list
[6]: https://hackage.haskell.org/package/unordered-containers-0.2.9.0
[7]: https://retro-freedom.nz/haskell-magic-from-semigroup-to-foldable.html
[8]: https://en.wikipedia.org/wiki/Functional_completeness#Characterization_of_functional_completeness
[9]: https://en.wikipedia.org/wiki/Sheffer_stroke#Other_Boolean_operations_in_terms_of_the_Sheffer_Stroke
